Genetic regression: Linear and Multilinear Genetic Regressions

Description

These functions estimate genetic effects by regression, from a population in which both genotypes and phenotypes are known.

Usage

linearRegression(phen, gen=NULL, genZ=NULL, 
    reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE)
multilinearRegression(phen, gen=NULL, genZ=NULL, 
    reference="noia", max.level=NULL, max.dom=NULL, fast=FALSE, 
    e.unique=FALSE, start.algo = "linear", start.values=NULL, 
    robust=FALSE, bilinear.steps=1, ...)

Value

linearRegression and multilinearRegression return objects of class "noia.linear" and "noia.multilinear", respectively, each with its own print method (print.noia.linear and print.noia.multilinear).

Arguments

phen: The vector of individual phenotypes measured in the population.
gen: The matrix of individual genotypes in the population, one column per locus. See genNames for the genotype encoding. Not necessary if genZ is provided.
genZ: The matrix of individual genotypic probabilities in the population, with 3 columns per locus giving the probability of each of the 3 genotypes (the three probabilities must sum to 1). Not necessary if gen is provided.
reference: The reference point from which the regression is performed. By default, the "noia" reference point is used, since it provides a good approximation of orthogonality (see Details). Other possibilities are "G2A", "F2", "F1", "Finf", "UWR", "P1" and "P2".
max.level: Maximum level of interactions.
max.dom: Maximum level for dominance effects. Has no effect if max.dom >= max.level. In the multilinear regression, the maximum level for dominance effects cannot exceed 1.
fast: Whether to use the "fast" algorithm, which is recommended when (i) the number of loci is high (> 8) and (ii) the dataset contains uncertainties (missing values or Haley-Knott regression). This algorithm computes the regression matrix directly, i.e. without computing the Z and S matrices.
e.unique: Whether the multilinear (epistatic) term is the same for all pairs of loci.
start.algo: Algorithm used to compute the starting values. Can be "linear", "multilinear", "subset" or "bilinear". Ignored if start.values are provided.
start.values: Vector of starting values for the non-linear regression.
robust: If TRUE, all starting-value algorithms are tried sequentially.
bilinear.steps: Number of steps of the bilinear algorithm. Ignored if start.algo is not "bilinear". If NULL, the bilinear algorithm is run until (almost) convergence.
...: Extra arguments passed to the non-linear regression function nls, including settings from nls.control.
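How max.level and max.dom prune the model can be illustrated by counting the effects that remain. The helper below, count_effects, is hypothetical (it is not a package function) and assumes that the "level" of an effect is the number of loci it involves and that max.dom caps how many of those loci carry a dominance rather than an additive component. Under that assumption the full model for n loci has 3^n effects; check the package output for the actual convention.

```r
# Hypothetical helper (not part of the package). ASSUMES: the level of an
# effect = number of loci involved; max.dom = maximum number of those loci
# contributing a dominance (instead of additive) component.
count_effects <- function(n.loci, max.level = n.loci, max.dom = max.level) {
  total <- 0
  for (j in 0:min(max.level, n.loci)) {   # j = number of loci in the effect
    d <- 0:min(j, max.dom)                # how many of them are dominance terms
    total <- total + choose(n.loci, j) * sum(choose(j, d))
  }
  total
}

count_effects(2)                   # full two-locus model: 3^2 = 9 effects
count_effects(2, max.dom = 1)      # drops the dominance-by-dominance effect: 8
count_effects(10, max.level = 2)   # 10 loci, at most pairwise effects: 201
```

Note that, under this counting, any max.dom >= max.level leaves the total unchanged, matching the description of max.dom above.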

Author

Arnaud Le Rouzic

Details

If gen is provided, it is converted into genZ. Missing data (unknown genotypes) are treated as loci whose genotypic probabilities equal the genotypic frequencies in the population.
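The conversion from gen to genZ, including the treatment of missing genotypes, can be sketched in base R. This is not the package's own code: gen_to_genZ is a hypothetical name, and the sketch assumes genotypes are coded 1, 2 and 3 with NA for missing values (the real encoding is documented in genNames).

```r
# Hypothetical sketch, not the package's implementation. ASSUMES genotypes
# coded 1, 2, 3 and missing genotypes coded NA (see genNames for the real one).
gen_to_genZ <- function(gen) {
  gen <- as.matrix(gen)
  blocks <- lapply(seq_len(ncol(gen)), function(l) {
    g <- gen[, l]
    known <- !is.na(g)
    # Genotypic frequencies at this locus, estimated from the known genotypes
    freq <- tabulate(g[known], nbins = 3) / sum(known)
    z <- matrix(0, nrow = length(g), ncol = 3)
    z[cbind(which(known), g[known])] <- 1          # known genotype: probability 1
    z[!known, ] <- rep(freq, each = sum(!known))   # missing: population frequencies
    z
  })
  do.call(cbind, blocks)   # 3 columns per locus
}

gen  <- cbind(c(1, 2, 2, 3, NA), c(3, 3, 1, NA, 2))
genZ <- gen_to_genZ(gen)
rowSums(genZ[, 1:3])   # each locus block sums to 1
```

Here the missing genotype of individual 5 at the first locus receives the probabilities (0.25, 0.5, 0.25), the frequencies observed among the four known genotypes.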

The algebraic framework is described extensively in Alvarez-Castro & Carlborg 2007. The default reference point ("noia") provides an orthogonal decomposition of genetic effects in the 1-locus case, whatever the genotypic frequencies. It remains a good approximation of orthogonality in the multi-locus case if linkage disequilibrium is small. Other optional reference points are those of the "G2A" model (Zeng et al. 2005), and the unweighted regression model "UWR" (Cheverud & Routman, 1995). Several key populations can be taken as reference as well: "F2", "F1", "Finf" (F infinity), and the two "parental" homozygous populations "P1" and "P2".
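The one-locus orthogonality of the "noia" reference can be checked numerically. The sketch below writes down a frequency-dependent coding matrix S (rows: genotypes 11, 12, 22; columns: mean, additive and dominance effects) following our reading of the statistical formulation in Alvarez-Castro & Carlborg 2007; treat the exact form as an assumption. The frequency-weighted cross-product t(S) D S, with D the diagonal matrix of genotypic frequencies, should have zero off-diagonal entries for any frequencies.

```r
# One-locus coding matrix, ASSUMED to match the statistical ("noia")
# formulation of Alvarez-Castro & Carlborg (2007).
noia_S <- function(p) {            # p = c(p11, p12, p22), summing to 1
  m <- p[2] + 2 * p[3]             # mean number of "2" alleles
  v <- p[1] + p[3] - (p[1] - p[3])^2
  cbind(mean = 1,
        add  = c(0, 1, 2) - m,
        dom  = c(-2 * p[2] * p[3], 4 * p[1] * p[3], -2 * p[1] * p[2]) / v)
}

# Frequency-weighted cross-products: zero off-diagonals mean orthogonal effects
crossprod_w <- function(p) {
  S <- noia_S(p)
  t(S) %*% diag(p) %*% S
}

round(crossprod_w(c(0.25, 0.50, 0.25)), 12)   # F2 frequencies
round(crossprod_w(c(0.60, 0.10, 0.30)), 12)   # arbitrary frequencies
```

Because the weights are the observed genotypic frequencies, orthogonality holds for any frequencies at one locus; with several loci it is only approximate when loci are in linkage disequilibrium.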

The multilinear model for genetic interactions is an alternative way to model epistatic interactions between two or more loci (see Hansen & Wagner 2001). Computing multilinear estimates requires a non-linear regression step that relies on the nls function. Good starting values are key to ensuring convergence of the non-linear regression, and several algorithms for computing them can be selected with the start.algo option: "linear" performs a linear regression and approximates the genetic effects from it; "multilinear" performs a simpler multilinear regression (without dominance) to initialize the genetic effects; "subset" estimates all genetic effects from a random subset (50%) of the population; and "bilinear" alternately estimates marginal and epistatic effects.
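The idea behind start.algo = "linear" can be illustrated with base R on simulated data. This is a simplified illustration, not the package's implementation: it assumes a two-locus multilinear model without dominance, y = mu + a1*x1 + a2*x2 + e*a1*a2*x1*x2, with loci coded -1, 0, 1, and the parameter names are our own.

```r
# Simplified two-locus multilinear model (no dominance), ASSUMED coding -1/0/1;
# not the package's own parameterization.
set.seed(1)
n  <- 500
x1 <- sample(c(-1, 0, 1), n, replace = TRUE)
x2 <- sample(c(-1, 0, 1), n, replace = TRUE)
# True values: mu = 0.5, a1 = 1, a2 = 0.5, e = 0.4
y  <- 0.5 + 1 * x1 + 0.5 * x2 + 0.4 * 1 * 0.5 * x1 * x2 + rnorm(n, sd = 0.1)
dat <- data.frame(y, x1, x2)

# Step 1: a linear regression provides starting values; e starts at 0
lin   <- lm(y ~ x1 + x2, data = dat)
start <- list(mu = unname(coef(lin)[1]), a1 = unname(coef(lin)[2]),
              a2 = unname(coef(lin)[3]), e = 0)

# Step 2: non-linear least squares on the multilinear model
fit <- nls(y ~ mu + a1 * x1 + a2 * x2 + e * a1 * a2 * x1 * x2,
           data = dat, start = start)
coef(fit)   # should be close to mu = 0.5, a1 = 1, a2 = 0.5, e = 0.4
```

Starting from the linear estimates places nls close to the solution, which is why a cheap linear fit is a sensible default before the non-linear step.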

References

Alvarez-Castro JM, Carlborg O. (2007). A unified model for functional and statistical epistasis and its application in quantitative trait loci analysis. Genetics 176(2):1151-1167.

Alvarez-Castro JM, Le Rouzic A, Carlborg O. (2008). How to perform meaningful estimates of genetic effects. PLoS Genetics 4(5):e1000062.

Cheverud JM, Routman EJ. (1995). Epistasis and its contribution to genetic variance components. Genetics 139:1455-1461.

Hansen TF, Wagner G. (2001). Modeling genetic architecture: A multilinear theory of gene interactions. Theoretical Population Biology 59:61-86.

Le Rouzic A, Alvarez-Castro JM. (2008). Estimation of genetic effects and genotype-phenotype maps. Evolutionary Bioinformatics 4.

Zeng ZB, Wang T, Zou W. (2005). Modelling quantitative trait loci and interpretation of models. Genetics 169:1711-1725.

Examples

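library(noia)
set.seed(123456789)

# Genotype-phenotype map of two loci (3 x 3 = 9 genotypic values)
map <- c(0.25, -0.75, -0.75, -0.75, 2.25, 2.25, -0.75, 2.25, 2.25)
# Simulate an F2 population of 500 individuals with environmental noise sigmaE = 0.2
pop <- simulatePop(map, N=500, sigmaE=0.2, type="F2")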

# Regressions

linear <- linearRegression(phen=pop$phen, gen=cbind(pop$Loc1, pop$Loc2))

multilinear <- multilinearRegression(phen=pop$phen, 
    gen=cbind(pop$Loc1, pop$Loc2))

# Linear effects, associated variances and stderr
linear

# Multilinear effects
multilinear